---
title: "English-paper"
layout: paper
permalink: /Graduationpaper-English/
author_profile: true
---
Academic Year 2020
Bachelor's Thesis

Deep learning-based middle ear disease diagnosis with otoendoscopy images

November 9, 2020
Soonchunhyang University
Department of Computer Science and Engineering

Submitted under the thesis advisor as a thesis for the degree of Bachelor of Engineering
November 9, 2020
Department of Computer Science and Engineering, College of Engineering, Soonchunhyang University

This certifies that the Bachelor of Engineering thesis of Jihwan Shin is approved.
November 9, 2020
Thesis Committee Member: Inkuk Chun (seal)
Thesis Committee Member: Yunyoung Nam (seal)

College of Engineering, Soonchunhyang University
Department of Computer Science and Engineering
ABSTRACT

This paper introduces a middle ear disease classification method based on a middle ear image dataset using deep learning (CNN), together with a web service that provides it. The images used in this study were provided by the Department of Otorhinolaryngology of Soonchunhyang University Cheonan Hospital. The subjects' medical records were investigated retrospectively after review by the Institutional Review Board (IRB) of Soonchunhyang University Cheonan Hospital (IRB No. SCHCA 2020-02-022). The images were divided into six classes: Normal, Traumatic Perforation, Otitis Media with Effusion (OME), Congenital Cholesteatoma, Chronic Otitis Media (COM), and Acute Otitis Media (AOM). After extracting the region of interest (ROI) from the dataset with Mask R-CNN, the dataset was reconstructed using augmentation techniques such as shifting, zooming, and rotation, and the model was trained with 5-fold cross-validation. The model combines EfficientNet-B0 and Inception-V3 through bagging. The accuracy on the test set was 97.29%, 3.5% higher than that obtained with the original dataset. Finally, by serving the model with Python Flask, we built a web site that predicts an image in about 3 seconds.

Keywords: CNN, EfficientNet-B0, Inception-V3, Mask R-CNN, Otitis media, Classification, Web
Table of Contents

Chapter 1 Introduction
Chapter 2 Prior Research and Related Technology Analysis
  2.1 Middle Ear Disease Classification
  2.2 Related Technology Cases
Chapter 3 System Design
  3.1 Development Environment
  3.2 Dataset
  3.3 Preprocessing
    3.3.1 Mask R-CNN
    3.3.2 Data Augmentation
  3.4 Classification
    3.4.1 Classification Model Architecture
  3.5 Web Server
    3.5.1 User Interface (UI)
Chapter 4 Experiments and Results
  4.1 Classification Experiment Results
  4.2 Comparison
  4.3 Web Execution Speed Experiment
Chapter 5 Conclusion and Future Work
References
List of Figures

[Figure 1] Deep Learning for Detection of Diabetic Eye Disease, 2016-11-29
[Figure 2] Vuno
[Figure 3] K-Fold cross-validation
[Figure 4] Eardrum images
[Figure 5] VGG Image Annotator
[Figure 6] Mask R-CNN result
[Figure 7] Eardrum region extraction result image
[Figure 8] Picture of no eardrum detected
[Figure 9] Data processing flowchart
[Figure 10] Web flowchart
[Figure 11] Initial screen
[Figure 12] Member registration screen
[Figure 13] Image upload screen
[Figure 14] Loading screen
[Figure 15] Results screen
[Figure 16] Result graph
List of Tables

[Table 1] Development environment specification
[Table 2] Mask R-CNN eardrum detection result
[Table 3] Dataset size
[Table 4] EfficientNet-B0 baseline network [19]
[Table 5] Inception-V3 baseline network [20]
[Table 6] Experimental results obtained from the original dataset
[Table 7] Experimental results obtained from the dataset reconstructed by storing only the eardrum area
[Table 8] Confusion Matrix
[Table 9] Results of Precision, Recall, F1-score
[Table 10] Processing speed experiment results
Chapter 1 Introduction
Otitis media is a common disease: roughly two out of three children aged 3 and younger contract it at least once, and one out of three contracts it three or more times. Although the incidence rate of acute otitis media in Korea is not precisely known, foreign reports indicate that 62% of children have had at least one episode by the age of 1 and 83% by the age of 3. In Korea, a nationwide study on the incidence of otitis media reported a prevalence of 0.08% for acute otitis media and 1.22% for otitis media with effusion in subjects under 15 years of age (Kim et al., 1993). According to the 2008 statistics of the National Health Insurance Review & Assessment Service [1], middle ear infections rank 10th in the frequency of doctor visits by patients under 10 years of age and 6th in the frequency of hospital and clinic visits. Unlike other upper respiratory infections, otitis media in infants can be accompanied by complications and sequelae when it is not properly treated with professional medical knowledge.
In this paper, to enable early diagnosis and thereby prevent otitis media and its complications and sequelae, which are common in infants, we introduce a deep learning-based web service. This study was conducted in cooperation with the Department of Otorhinolaryngology of Soonchunhyang University Cheonan Hospital, and the subjects' medical records were investigated retrospectively after deliberation by the Institutional Review Board (IRB) of Soonchunhyang University Hospital (IRB No. SCHCA 2020-02-022). The disease was divided into six categories: Normal, AOM, COM, Congenital Cholesteatoma, OME, and Traumatic Perforation, and the study was conducted with 4,808 images. In addition, instead of a single CNN model, accuracy was improved by adding an ensemble model and a Mask R-CNN model that extracts the tympanic region during preprocessing. Through this, we show that a preprocessing step is needed for a relatively small dataset, demonstrate the improved accuracy of the ensemble model, and introduce a web service that takes less than 3 seconds per diagnosis.
The rest of this paper is organized as follows. Chapter 2 introduces related research and technology analysis. Chapter 3 explains the system design and the deep learning classification model, and introduces the research methods and the web service. Chapter 4 presents the experiments and results, and Chapter 5 concludes with future work.
Chapter 2 Prior Research and Related Technology Analysis

2.1 Middle Ear Disease Classification
Middle ear disease is categorized as Traumatic Perforation, Otitis Media with Effusion (OME), Congenital Cholesteatoma (CC), Chronic Otitis Media (COM), and Acute Otitis Media (AOM) [2].
Traumatic perforation is a type of otitis media that can cause pain, bleeding, hearing loss, tinnitus, and dizziness. It is caused by events such as an object inserted into the middle ear, an explosion, a slap, head trauma, sudden decompression, or barotrauma. A perforation through which the organs inside the middle ear are visible, and the bleeding it causes, are diagnosed through an otoscope [3].
Otitis media with effusion refers to a middle ear infection caused by a bacterial or viral infection and is characterized by exudate. OME occurs frequently in infants and young children; the eardrum expands due to the exudate or the increase in pressure behind it, which can result in a hole in the eardrum through which the exudate flows out [4].
Congenital ear abnormalities mean that the ears are absent, not formed, or incompletely developed at birth, such as hearing impairment due to the lack of an eardrum or the absence of the external ear. In this paper, only incomplete eardrum formation was considered [5].
COM refers to purulent otitis media, a traumatic perforation lasting more than 6 weeks. This type of otitis media involves painless otorrhea and conductive hearing impairment, and can be accompanied by other lesions such as polyps. In this case, diagnosis using CT and MRI is required to observe the otorrhea [6].

AOM refers to sudden inflammation in the eardrum, with rapid cycles of pain, fever, and fluid. In infants and toddlers, reactions such as crying, irritability, lack of sleep, and refusal to feed appear. In this case, the eardrum changes to red and yellow, and the disease is diagnosed through an otoscope. Treatment is done through procedures and antibiotics for adults, but only with antibiotics for infants and toddlers [7].
2.2 Related Technology Cases
1) Google Fundus Image Study

Numerous studies applying deep learning to diabetic retinopathy (DR) diagnosis have been published in the global medical journals JAMA, Ophthalmology, and Nature [8]-[10], [13]. The first study that applied deep learning to DR reading and was published in a medical journal is a research paper published by Google in JAMA in 2016 [8]. The performance presented in the paper was an AUC of 0.99, comparable to that of a professional ophthalmologist.

[Figure 1] Deep Learning for Detection of Diabetic Eye Disease, 2016-11-29

Google trained Inception-V3, one of the deep neural networks, on 120,000 fundus images, and the trained model was tested with 8,788 EyePACS-1 images [10] and 1,745 Messidor-2 images [12]. The paper provides detailed data on image quality and labeling quality. However, the EyePACS-1 database shows that data samples were taken for both training and testing, and regrettably there is no mention that the samples were taken without overlap. In addition, the imbalanced distribution of normal and diseased images in the test data should be kept in mind when interpreting the performance evaluation results.
2) Medical Artificial Intelligence Diagnostic Software

Vuno [13] is a medical artificial intelligence solution company that analyzes various medical data and assists disease diagnosis by applying artificial intelligence (AI) technology to the medical field; it assists in reading chest X-ray images, chest CT images, and eye diseases. It is expanding the field of AI-based medical care through the development and commercialization of reading-assistance solutions for medical personnel. [Fig. 2] below shows an actual application example of Vuno.

[Figure 2] Vuno
3) Flask

Flask [14] is a micro web framework written in Python, based on the Werkzeug toolkit and the Jinja2 template engine, and distributed under the BSD license. It is called a micro framework because it requires no particular tools or libraries: it has no database abstraction layer, form validation, or other components for which existing third-party libraries already provide common functionality. However, Flask supports extensions that add application features as if they were implemented in Flask itself. Extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies, and tools related to several common frameworks.
In this paper, Flask was used to execute Python code on the deep learning
server on the web.
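As a concrete illustration, a minimal Flask route of the kind such a service could expose is sketched below. The route name `/predict`, the field name `image`, and the placeholder response are assumptions for illustration, not the thesis' actual code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # In the real service the uploaded otoendoscopy image would be passed
    # to the deep learning model; here a placeholder result is returned.
    file = request.files.get("image")
    if file is None:
        return jsonify(error="no image uploaded"), 400
    return jsonify(filename=file.filename, diagnosis="pending")

if __name__ == "__main__":
    app.run(port=5000)
```

Because Flask runs ordinary Python in the request handler, the deep learning prediction can be called directly inside `predict()` without a separate socket layer, which is the property the thesis relies on.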
4) K-Fold Cross-Validation

K-Fold cross-validation [15] is a method of evaluating models in statistics. Held-out validation, which evaluates model performance using only one part of the data as a validation set, becomes unreliable when the dataset is small: if the measured performance differs depending on how the test set is chosen, the evaluation metric is biased by chance.

To solve this problem, K-Fold cross-validation ensures that every sample is used in a test set exactly once. In [Fig. 3] below, the data are divided into 5 pieces and the test set is changed each time. In the first iteration, B, C, D, and E form the train set and A the test set, and performance is evaluated. In the second iteration, performance is evaluated with A, C, D, and E as the train set and B as the test set. This yields a total of 5 performance metrics, which are usually averaged to evaluate the model. The number of pieces into which the data are divided is the K of K-Fold cross-validation. In this paper, 5 folds were used.
[Figure 3] K-Fold cross-validation
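The fold construction described above can be sketched in plain Python. This is a simplified illustration; the thesis' actual split additionally keeps a separate validation set.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k folds, so that
    every sample appears in a test set exactly once."""
    indices = list(range(n_samples))
    # Distribute samples as evenly as possible across the k folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]
```

Averaging the k per-fold scores then gives the final evaluation metric, as described above.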
Chapter 3 System Design

3.1 Development Environment
The study was conducted with the specifications shown in [Table 1] below. Epoch was set to 1000, and training was stopped by the Early Stopping technique at the point where the validation loss had not improved for 20 epochs. This setup was applied to the VGG19, ResNet50, ResNet101, ResNet152, Inception-V3, and EfficientNet-B0 models, which were then compared. The total training time was 70 hours.
[Table 1] Development environment specification

| Part | Specification |
|------|---------------|
| GPU | NVIDIA TITAN RTX (3EA) |
| RAM | 192GB |
| CPU | Intel(R) Xeon(R) Silver 4114 CPU @ 2.2GHz (20EA) |
| OS | Windows 10 Education 64bit |
3.2 Dataset
The study was conducted in cooperation with Soonchunhyang University Cheonan Hospital, and the total number of images provided by the hospital was 4,808, covering six categories: Normal, Traumatic Perforation, AOM, COM, Congenital Cholesteatoma, and OME. [Fig. 4] below shows an eardrum image for each label.

[Figure 4] Eardrum images
a) Normal b) Traumatic Perforation c) AOM
3.3 Preprocessing

3.3.1 Mask R-CNN
If the eardrum was covered by earwax or other foreign matter in the eardrum image, accuracy decreased during deep learning. Since the number of images was small, we aimed to obtain images of as high quality as possible, so the eardrum region was extracted from each image with Mask R-CNN [16]. Before training, annotation was performed using the VGG Image Annotator (VIA) tool [17], as shown in [Fig. 5] below.

[Figure 5] VGG Image Annotator

The train set contained 309 images and the validation set 101 images; [Fig. 6] shows the result. On the validation set, the tympanic membrane detection accuracy was 99%.

[Figure 6] Mask R-CNN result
At test time, the trained detector was applied to the 4,808 images provided by Soonchunhyang University Cheonan Hospital. The detection rate was 84.63%, and [Table 2] shows the result. The reason for the low detection rate is that the Mask R-CNN training and validation sets were annotated only with tympanic images diagnosed as normal, while the image quality of the disease dataset was relatively lower than that of the normal dataset. Here, low quality refers to cases in which the eardrum region of the image is dark, the resolution is so low that the eardrum cannot be confirmed by the naked eye, or earwax covers the eardrum. The dataset was reconstructed by storing only the 4,254 eardrum regions obtained from the test set.
[Table 2] Mask R-CNN eardrum detection result

| Class | Original | Eardrum extraction |
|-------|----------|--------------------|
| Normal | 2,040 | 1,971 |
| Traumatic Perforation | 356 | 295 |
| AOM | 804 | 654 |
| COM | 824 | 698 |
| Congenital Cholesteatoma | 384 | 312 |
| OME | 400 | 324 |
| Total | 4,808 | 4,254 |
[Fig. 7] below visualizes the detection result through Mask R-CNN.
[Figure 7] Eardrum region extraction result image
A filtering effect is also obtained through Mask R-CNN when the image quality is poor or the eardrum is covered with earwax. [Fig. 8] below shows pictures in which no eardrum was detected by Mask R-CNN. Through this, the dataset was refined automatically; as a result, it was possible to remove noisy images remarkably quickly and extract only the region of interest (ROI).
[Figure 8] Picture of no eardrum detected
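The ROI extraction and filtering step above can be sketched as follows: given a binary eardrum mask of the kind Mask R-CNN predicts, crop the image to the mask's bounding box, and discard images where nothing was detected. This is a simplified NumPy sketch; the actual pipeline used the Mask R-CNN implementation's own outputs.

```python
import numpy as np

def crop_eardrum_roi(image, mask):
    """Crop `image` to the bounding box of the binary `mask`.
    Returns None when the mask is empty, i.e. no eardrum was detected,
    so such images are filtered out of the dataset."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```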
3.3.2 Data Augmentation
Previously, the eardrum region was extracted with the Mask R-CNN model in order to refine the dataset, extract the region of interest, and enhance the training effect. In addition, using the Python module OpenCV, the CLAHE (Contrast Limited Adaptive Histogram Equalization) algorithm was applied to reduce noise in the data, the image size was changed to 384×384, and the RGB values were divided by 255. Finally, the data were augmented using various techniques such as rotation, enlargement, vertical flipping, and brightness adjustment. The augmentation was aimed at preventing overfitting.
[Table 3] Dataset size

| Class | Original | Original (augmented) | Eardrum extraction | Eardrum extraction (augmented) |
|---|---|---|---|---|
| Normal | 2,040 | 5,765 | 1,971 | 5,695 |
| Traumatic Perforation | 356 | 5,765 | 295 | 5,695 |
| AOM | 804 | 5,765 | 654 | 5,695 |
| COM | 824 | 5,765 | 698 | 5,695 |
| Congenital Cholesteatoma | 384 | 5,765 | 312 | 5,695 |
| OME | 400 | 5,765 | 324 | 5,695 |
| Total | 4,808 | 34,590 | 4,254 | 34,170 |
[Table 3] shows the size of the dataset. The ratio of training, validation, and test sets was 3:1:1, and the data were divided into 5 folds for K-Fold cross-validation. More folds could be configured, but 5 folds were the most appropriate in terms of time required and performance.
3.4 Classification
For classification, the TensorFlow and Keras (Python) modules were used, Adadelta was used as the optimizer, and through the Early Stopping method, training was set to stop when the validation loss did not decrease for 20 epochs. This prevents overfitting and yields high accuracy.

Using EfficientNet-B0, Inception-V3, and the proposed ensemble model, two experiments were conducted: one on the original photographic dataset and one on the dataset reconstructed by extracting the eardrum region with the Mask R-CNN model. [Fig. 9] shows the data processing flowchart.
[Figure 9] Data processing flowchart
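The early-stopping rule described above (stop once the validation loss has failed to improve for 20 consecutive epochs) can be sketched framework-independently. The helper below is illustrative, not the Keras callback itself:

```python
def early_stopping_epoch(val_losses, patience=20):
    """Return the index of the epoch at which training stops: the first
    epoch after `patience` consecutive epochs without a new best loss."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0  # new best: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # ran to the end without triggering
```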
3.4.1 Classification Model Architecture
The model proposed in this paper combines the EfficientNet-B0 and Inception-V3 models through bagging [18]. [Table 4] and [Table 5] below show the architecture of each model.
[Table 4] EfficientNet-B0 baseline network [19]

| Stage | Operator | Resolution | Channels | Layers |
|---|---|---|---|---|
| 1 | Conv3x3 | 224×224 | 32 | 1 |
| 2 | MBConv1, k3x3 | 112×112 | 16 | 1 |
| 3 | MBConv6, k3x3 | 112×112 | 24 | 2 |
| 4 | MBConv6, k5x5 | 56×56 | 40 | 2 |
| 5 | MBConv6, k3x3 | 28×28 | 80 | 3 |
| 6 | MBConv6, k5x5 | 14×14 | 112 | 3 |
| 7 | MBConv6, k5x5 | 14×14 | 192 | 4 |
| 8 | MBConv6, k3x3 | 7×7 | 320 | 1 |
| 9 | Conv 1x1 & Pooling & FC | 7×7 | 1280 | 1 |
[Table 5] Inception-V3 baseline network [20]

| Type | Patch size/Stride or Remarks | Input Size |
|---|---|---|
| Conv | 3×3/2 | 299×299×3 |
| Conv | 3×3/1 | 149×149×32 |
| Conv | 3×3/1 | 149×149×32 |
| Pool | 3×3/2 | 149×149×64 |
| Conv | 3×3/1 | 73×73×64 |
| Conv | 3×3/2 | 71×71×80 |
| Conv | 3×3/1 | 35×35×192 |
| 3 × Inception | Reference [20] | 35×35×288 |
| 5 × Inception | | 17×17×768 |
| 2 × Inception | | 8×8×1280 |
| Pool | 3×3/1 | 8×8×2048 |
| Linear | Logits | 1×1×2048 |
| Softmax | Classifier | 1×1×1000 |
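How the two base networks' outputs could be combined is sketched below as soft voting: averaging the class probabilities of the two models and taking the per-sample argmax. Equal weighting of the two models is an assumption for illustration.

```python
import numpy as np

def ensemble_predict(probs_effnet, probs_inception):
    """Average the softmax outputs of the two base models and take the
    per-sample argmax as the predicted class."""
    avg = (np.asarray(probs_effnet) + np.asarray(probs_inception)) / 2.0
    return avg.argmax(axis=-1)
```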
3.5 Web Server
The module used for the web server is Flask; through it, deep learning results within Python could easily be passed to the front end without a separate socket. In addition, routing was performed through the module's own methods, and external access to the web was enabled through port forwarding. [Fig. 10] below is a flowchart of the web pages, which consist of a total of 5 HTML documents.
[Figure 10] Web flowchart
3.5.1 User Interface (UI)
1) Initial Screen

[Fig. 11] below is the initial screen, where the user can log in or register as a member. When logging in, the server queries the DB and checks the validity of the ID and password.

[Figure 11] Initial screen
2) Member Registration Screen

[Fig. 12] below is the member registration screen; if the conditions are satisfied after the duplicate check of the email address and password, the account is saved in the DB.

[Figure 12] Member registration screen
3) Upload Screen

[Fig. 13] below shows the step of uploading a symptom picture; when the submit button is clicked, the test starts with the image received by the server. In this process, the larger the uploaded image, the longer it takes.

[Figure 13] Image upload screen
4) Processing Screen

[Fig. 14] below is the HTML document shown during the time required for deep learning. When using the TensorFlow GPU version, it took about 8 seconds to allocate the GPU, but when testing with the CPU version, it took about 0.8 seconds.

[Figure 14] Loading screen
5) Result Screen

[Fig. 15] below is the screen that displays the result; this step outputs the result on the web once it is received from the server. Uploading and testing, from step 3) to this result screen, take about 0.8 seconds in total. The smaller the image, the faster the response.

[Figure 15] Results screen
Chapter 4 Experiments and Results

4.1 Classification Experiment Results
The experiment was conducted with two datasets: the original dataset and the dataset refined beforehand through Mask R-CNN. The models used for classification were VGG19, ResNet50, ResNet101, ResNet152, Inception-V3, and EfficientNet-B0. The optimizer of all models was Adadelta, the loss function was Sparse Categorical Crossentropy [21], and the learning rate was set to 0.13. The two tables below show the experimental results.
[Table 6] Experimental results obtained from the original dataset

| Model | KF0 | KF1 | KF2 | KF3 | KF4 | Params |
|---|---|---|---|---|---|---|
| Proposed Model | 93.23% | 92.46% | 93.41% | 93.86% | 92.56% | 112.7M |
| EfficientNet-B0 | 92.87% | 90.83% | 92.49% | 91.14% | 90.72% | 20.9M |
| Inception-V3 | 89.41% | 90.27% | 90.69% | 89.89% | 89.43% | 91.8M |
| Resnet152 | 89.71% | 88.43% | 88.49% | 89.11% | 89.86% | 223.0M |
| Resnet101 | 88.99% | 89.37% | 89.72% | 89.11% | 88.45% | 163.0M |
| Resnet50 | 90.53% | 90.29% | 90.28% | 89.33% | 89.12% | 90.3M |
| VGG19 | 84.35% | 85.21% | 85.39% | 86.01% | 84.56% | 1304.0M |
[Table 7] Experimental results obtained from the dataset reconstructed by storing only the eardrum area

| Model | KF0 | KF1 | KF2 | KF3 | KF4 | Params |
|---|---|---|---|---|---|---|
| Proposed Model | 96.86% | 96.03% | 96.77% | 97.29% | 97.01% | 112.7M |
| EfficientNet-B0 | 95.13% | 95.76% | 95.93% | 96.11% | 95.10% | 20.9M |
| Inception-V3 | 94.01% | 94.99% | 95.18% | 94.50% | 93.68% | 91.8M |
| Resnet152 | 90.12% | 89.55% | 89.49% | 89.18% | 88.57% | 223.0M |
| Resnet101 | 91.93% | 90.16% | 89.89% | 90.81% | 91.12% | 163.0M |
| Resnet50 | 93.27% | 92.59% | 93.23% | 92.48% | 91.61% | 90.3M |
| VGG19 | 85.13% | 85.43% | 86.43% | 86.23% | 85.01% | 1304.0M |
In addition, from the weights obtained by training the proposed model on the dataset reconstructed by storing only the eardrum region, the best result on the test set is reported: [Table 8] shows the confusion matrix, and [Table 9] the corresponding Precision, Recall, and F1-Score.
[Table 8] Confusion Matrix of Best Score
(TP: Traumatic Perforation, C_C: Congenital Cholesteatoma)

| Actual \ Predicted | Normal | TP | AOM | COM | C_C | OME |
|---|---|---|---|---|---|---|
| Normal | 1126 | 0 | 6 | 0 | 1 | 6 |
| TP | 0 | 1088 | 0 | 51 | 0 | 0 |
| AOM | 46 | 0 | 1082 | 7 | 2 | 2 |
| COM | 0 | 0 | 0 | 1139 | 0 | 0 |
| C_C | 0 | 0 | 0 | 64 | 1075 | 0 |
| OME | 0 | 0 | 0 | 0 | 0 | 1139 |
[Table 9] Results of Precision, Recall, F1-score

| Class | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|
| OME | 99.303 | 100 | 99 |
| CC | 99.722 | 94.381 | 97 |
| COM | 90.325 | 100 | 95 |
| AOM | 99.449 | 94.996 | 97 |
| TP | 100 | 95.522 | 98 |
| NR | 96.075 | 98.859 | 97 |
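The per-class metrics in [Table 9] follow directly from the confusion matrix in [Table 8]; a sketch of the computation, with rows as actual classes and columns as predicted classes:

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, recall, and F1 from a confusion matrix whose
    rows are actual classes and whose columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # column sums: all predicted as class
    recall = tp / cm.sum(axis=1)     # row sums: all actually in class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```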
4.2 Comparison
[Figure 16] Result graph

[Fig. 16] above shows the graph for the fold with the highest accuracy among the 5 folds. The experiment shows that the accuracy of the ensemble model is higher than that of a single model, and that when the dataset is limited, accuracy improves when the data are preprocessed and reconstructed before classification.
4.3 Web Execution Speed Experiment
The experiment measured the time from when the image was uploaded to when the result was produced. [Table 10] shows the results of the processing speed experiment. The measured procedure was: check the transmitted file, check its name, move it to the designated directory, check for duplicates, transmit it to the Soonchunhyang University ICT deep learning server, predict, and receive the result; the total time required was measured. Each measurement was repeated 20 times, and the image size ranged from 1MB to 5MB. The result is the sum of the prediction time and the time for the image to be uploaded, transmitted, and received. A disadvantage is that, for large images, it takes longer to obtain the result.
[Table 10] Processing speed experiment results

| Image size | 1MB | 2MB | 3MB | 4MB | 5MB |
|---|---|---|---|---|---|
| Average (sec) | 2.116 | 2.231 | 2.551 | 2.436 | 2.590 |
Chapter 5 Conclusion and Future Work
In a reality that is rapidly evolving in line with the 4th industrial revolution, the medical system is also becoming smarter, and in this paper a middle ear disease diagnosis model using deep learning was proposed. Two models were combined through the bagging method instead of using a single model, and the dataset was reconstructed and trained by extracting the region of interest from the original images. Through this, the accuracy was improved, with a highest accuracy of 97.29%. And by building a web server, it was possible to diagnose middle ear disease within 3 seconds. However, a disadvantage is that uploading a high-quality image takes a long time, because the upload time is added to the diagnosis time. This should be addressed by processing the image before upload so that its size does not become too large.

Since this is a system for diagnosing diseases, more accurate and precise results must be obtained. This can be achieved by repeating training and validation in cooperation with hospitals. In addition, the Mask R-CNN should be tuned more precisely or its accuracy increased by adding data, and an algorithm that detects earwax large enough to cover the eardrum should be added so that the user can obtain a more accurate diagnosis. In the future, if performance and efficiency are proven through systematic clinical trials with the intervention of medical experts, and the relevant technology is combined with medical devices, this system could develop into a professional medical device capable of diagnosing disease in urgent situations.
References

[1] International Data Corporation, 2014, www.idc.com/
[2] https://www.msdmanuals.com/en-kr/home/ear,-nose,-and-throat-disorders/biology-of-the-ears,-nose,-and-throat/ears
[3] https://www.msdmanuals.com/en-kr/professional/ear,-nose,-and-throat-disorders/middle-ear-and-tympanic-membrane-disorders/traumatic-perforation-of-the-tympanic-membrane
[4] https://www.msdmanuals.com/en-kr/professional/ear,-nose,-and-throat-disorders/middle-ear-and-tympanic-membrane-disorders/otitis-media-acute
[5] https://www.msdmanuals.com/en-kr/professional/pediatrics/congenital-craniofacial-and-musculoskeletal-abnormalities/congenital-ear-abnormalities
[6] https://www.msdmanuals.com/en-kr/professional/ear,-nose,-and-throat-disorders/middle-ear-and-tympanic-membrane-disorders/otitis-media-chronic
[7] https://medicalguidelines.msf.org/viewport/CG/english/acute-otitis-media-aom-16689234.html
[8] Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 2016.
[9] Krause, Jonathan, et al. Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy. Ophthalmology, 2018.
[10] Abramoff, Michael D., et al. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. npj Digital Medicine, 2018, 1.1: 39.
[10] EyePACS, http://www.eyepacs.com/, access date: 2018. 10. 21
[11] Van Der Heijden, Amber A., et al. Validation of automated screening for referable diabetic retinopathy with the IDx-DR device in the Hoorn Diabetes Care System. Acta Ophthalmologica, 2018, 96.1: 63-68.
[12] Decencière et al. Feedback on a publicly distributed database: the Messidor database. Image Analysis & Stereology, V.33, N.3, pp. 231-234, Aug. 2014. ISSN 1854-5165.
[13] Vuno, https://www.vuno.co/static/pdf/BoneAge.pdf
[14] Flask, https://flask.palletsprojects.com/en/1.1.x/
[15] IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3):569-575, April 2010.
[16] Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick (Facebook AI Research). Mask R-CNN. https://arxiv.org/pdf/1703.06870.pdf
[17] VGG Image Annotator, http://www.robots.ox.ac.uk/~vgg/software/via/
[18] Peter Bühlmann. Bagging, Boosting and Ensemble Methods. January 2012, https://www.researchgate.net/publication/45130375_Bagging_Boosting_and_Ensemble_Methods
[19] EfficientNet, https://arxiv.org/abs/1905.11946
[20] Inception-V3: Rethinking the Inception Architecture for Computer Vision, https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf
[21] Sparse Categorical Crossentropy, https://cwiki.apache.org/confluence/display/MXNET/Multi-hot+Sparse+Categorical+Cross-entropy